Mandarin-English Information (MEI)
Abstract
Mandarin-English Information (MEI) is one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. We plan to develop technologies for using written queries to search spoken documents (cross-media) between English and Mandarin Chinese (cross-language). Our research focus is on the integration of speech recognition and machine translation technologies in the context of translingual speech retrieval. We plan to work on the problems of: (i) indexing Mandarin Chinese audio with word and subword units, (ii) translating variable-size units for cross-language information retrieval, and (iii) devising effective retrieval strategies for English text queries and Mandarin Chinese news audio.
Related resources
Multi-scale-audio indexing for translingual spoken document retrieval
MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...
Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system
This paper presents a multi-scale retrieval approach in MEI (Mandarin-English Information), an English-Chinese cross-lingual spoken document retrieval (CL-SDR) system. It accepts an entire English news story (from newspaper text) as the input query, and automatically retrieves "relevant" Mandarin news stories (from broadcast audio). This allows the user to search for personally relevant content...
A Cross-Linguistic Study of Voice Onset Time in Stop Consonant Productions
This study examines voice onset time (VOT) for phonetically voiceless word-initial stops in Mandarin Chinese and in English, as spoken by 11 Mandarin speakers and 4 British English speakers. The purpose of this paper is to compare Mandarin and English VOT patterns and to categorize their stop realizations along the VOT continuum. As expected, the findings reveal that voiceless aspirated stops i...
Children’s Knowledge of Disjunction and Universal Quantification in Mandarin Chinese
Downward entailing linguistic environments license inferences from sets to their subsets. These environments also determine the interpretation of disjunction: Disjunction licenses a conjunctive entailment in the scope of downward entailing operators (Crain 2008, 2012). This leads to a striking asymmetry across languages in the interpretation of disjunction when it appears in the restrictor (dow...
Improving Language Models for Mandarin Conversational Speech Recognition with Web Data
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted for a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In addition, we investigate different techniques ...